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Abstract. We advocate the use of qualitative models in the analysis of large biological systems. We 
show how qualitative models are linked to theoretical differential models and practical graphical models 
of biological networks. A new technique for analyzing qualitative models is introduced, which is based 
on an efficient representation of qualitative systems. As shown through several applications, this repre- 
sentation is a relevant tool for the understanding and testing of large and complex biological networks. 

1 Introduction 

Understanding the behavior of a biological system from the interplay of its molecular components is a 
>^ ^ particularly difficult task. A model-based approach proposes a framework to express some hypotheses 

j->^ ■ about a system and make some predictions out of it, in order to compare with experimental observa- 

tions. Traditional approaches (see 1 6 1 for an interesting review) include ordinary differential equations 
or stochastic processes. While they are powerful tools to acquire a fine grained knowledge of the sys- 
^ ■ tem at hand, these frameworks need accurate experimental data on chemical reactions kinetics, which 

^ . are scarcely available. Furthermore, they also are computationally demanding and their practical use is 

restricted to a limited number of variables. 

^ O ' As an answer to these issues, many approaches were proposed, that abstract from quantitative details 

^ ■ of the system. Among others, let us stress the work done on gene regulation dynamics Qj hybrid 

qh| systems fTDI or discrete event systems |4|, |3|. The goal of such qualitative frameworks is to enable 

sjjl I system-level analysis of a biological phenomenon. This appears as a relevant answer to recent technical 

k>( ' breakthrough in experimental biology: 

rS 

H . • microarrays, mass spectrometry, protein chips currently allow to measure thousands of variables 

simultaneously, 

• obtained measurements are rather noisy, and may not be quantitatively reliable. 

Microarrays for instance, are used for comparing the activity of genes between two experimental 
settings. A microarray experiment gives differential measure between two experimental settings. It de- 
livers informations on the relative activity of each gene represented on the array. Despite many attempts 
made to quantified the output of microarrays, the essential output of the technique says, for example, 
that a gene G is more active in situation A than in situation B. 



In this paper, we use a framework developed in (25\ for the comparison of two experimental con- 
ditions, in order to derive qualitative constraints on the possible variations of the variables. Our main 
contribution is the use of an efficient representation for the set of solutions of a qualitative system. 
This representation allows to solve systems with hundreds of variables. Moreover, this representation 
opens the way to finer analysis of qualitative systems. This new approach is illustrated by solving three 
important problems: 

• checking the accordance of a qualitative system with qualitative experimental data. 

• minimally correcting corrupted data in discordance with a model 

• helping in the design of experiments 

Our main focus here is to show how to use large qualitative models and qualitative interpretations 
of experimental data. In this respect our work could be used as an extension to what was proposed in 
1^3 1, where basically the authors propose to analyze pangenomic gene expression arrays in E.coli, using 
simple qualitative rules. 

In the first section we establish links between differential, graphical and qualitative models. 

2 Mathematical modeling 

In this section we show how qualitative models can be linked to more traditional differential models. 
Differential models are central to the theory of metabolic control ||9|[n. They also have been applied 
to various aspects of gene networks dynamics. The purpose of this section is to lay down a set of 
qualitative equations describing steady states shifts of differential models. For the sake of completeness, 
we rederive in a simpler case results that have been established in greater generality in 125 . 22| . 

2.1 Modeling assumptions 

Let us consider a network of interacting cellular constituents, numbered from 1 to n. These constituents 
may be proteins, RNA transcripts or metabolites for instance. The state vector X denotes the concentra- 
tion of each constituent. 

Differential dynamics X is assumed to evolve according to the following differential equation: 

where F is an (unknown) nonlinear, differentiable function. A steady state Xgq of the system is a solution 
of the algebraic equation: 

F{Xeq)=0. 

Steady states are asymptotically stable if they attract all nearby trajectories. A steady state is non- 
degenerated if the Jacobian calculated in that steady state is non- vanishing. According to the Grobman- 
Hartman theorem, a sufficient condition to have nondegenerated asymptotically stable steady states is 
Re{Xi) < —C,C > 0,/ = 1, . . . ,n, where A, are the eigenvalues of the Jacobian matrix calculated at the 
steady state. 



Experiment modeling Typical two state experiments such as differential microarrays are modeled as 
steady state shifts. We suppose that under a change of the control parameters in the experiment, the 
system goes from one non-degenerated stable steady state to another one. The output of the two state 
experiment can be expressed in terms of concentration variations for a subset of products, between the 
two states. We suppose that the signs of these variations were proven to be statistically significant. 

Interaction graph The only knowledge we require about the function F concerns the signs of the 
derivatives ^. These are interpreted as the action of the product j on the product /. It is an activation if 
the sign is +, an inhibition if the sign is -. A null value means no action. 
An interaction graph G{V,E) is derived from the Jacobian matrix of F: 

• with nodes V = { 1 , . . . , n} corresponding to products 

• and (oriented) edges E = {(7,01 ax" ^^ ^}- Edges are labeled by s{j,i) = sgn(^). 

The set of predecessors of a node / in G is denoted pred(/). The interaction graph is actually built 
from informations gathered in the literature. In consequence in some places it may be incomplete (some 
interactions may be missing), in others it may be redundant (some interactions may appear several times 
as direct and indirect interactions). It is an important issue that neither incompleteness nor redundancy 
do not introduce inconsistencies and this will be addressed in section |5] 

Negative diagonal in the Jacobian matrix For any product /, we exclude the possibility of vanishing 
diagonal elements of the Jacobian ^. This can be justified by taking into account degradation and 
dilution (cell growth) processes that can be represented as negative self-loops in the interaction graph, 
that is for all /, (/, /) G E and s{i, i) = -. 

Discussion In our mathematical modeling we suppose that the system starts and ends in non-degenerated 
stable steady states. Of course this is not always the case for several reasons: the waiting time to reach 
steady state is too big; one can end up in a limit cycle and oscillate instead of reaching a steady state. All 
these possibilities should be considered with caution. Actually this hypothesis might be difficult to check 
from the two states only. Complementary strategies such as time series analysis could be employed in 
order to assess the possibility of limit cycle oscillations. 

Positive self -regulation is also possible but introduces a supplementary complication. In this case for 
certain values of the concentrations degradation exactly compensates the positive self -regulation and the 
diagonal elements of the Jacobian vanish (this is a consequence of the intermediate value theorem). We 
can avoid dealing with this situation by considering that the positive self -regulation does not act directly 
and that it involves intermediate species. This is a realistic assumption because a molecule never really 
acts directly on itself (transcripts can be auto-regulated but only via protein products). Thus, all nodes 
can keep their negative self-loops and all diagonal elements of the Jacobian can be considered to be non- 
vanishing. Although the positive regulation may imply vanishing higher order minors of the Jacobian, 
this will not affect our local qualitative equations. 

2.2 Quantitative variation of one variable 

We focus here on the variation of the concentration of a single chemical species represented by a com- 
ponent Xi of the vector X. Since we have adopted a static point of view, we are only interested in 
the variation of X,- between two non-degenerated stable steady states X} and X^ independently of the 
trajectory of the dynamical system between the two states. 

Let us denote by X, the vector of dimension n, obtained by keeping from X all coordinates j that 
are predecessors of / in the interaction graph. Then, under some additional assumptions described and 
discussed in l22l . we have the following result: 



Theorem 2.1 

The variation of the concentration of species i between two non-degenerated steady states X} andX^ is 
given by 

Js \dXJ ^gp,,d(;) CA/, 

where S is the segment linking Xj to X^ . 

Full proof is given in |22|. The above formula is a quantitative relation between the variation of 
concentrations and the derivatives -^. Now our next move will be to introduce a qualitative abstraction 
of this relation. 

2.3 Qualitative equations 

We propose here to study Eq. ^in sign algebra. By sign algebra, we mean the set {+,-,?}, where ? 
represents undetermined sign. This set is provided with the natural commutative operations: 

++-=? +++=+ -+-=- +x-=- +x+=+ -x-=+ 
?+_=? ?++=? ?+?=? ?x-=? ?x+=? ?x?=? 

Equality in sign algebra w is defined as follows: 



^ + - ? 
+ T F T 
~ F T ^ 



Importantly, qualitative equality is not an equivalence relation, since it is not transitive. This implies 
that computations in qualitative algebra must be carried with care. At least two major properties should 
be emphasized: 

• if a term of a sum is indeterminate (?) then the whole sum is indeterminate. 

• if one hand of a qualitative equality is indeterminate, then the equality is satisfied whatever the 
value of the other hand is. 

A qualitative system is a set of algebraic equations with variables in {+,-,?}. A solution of this 
system is a valuation of the unknowns which satisfies each equation, and such that no variable is instan- 
tiated to ?. This last requirement is important since otherwise any system would have trivial solutions 
(like all variables to ?). 

Theorem 2.2 

Under the assumptions and notations of Theorem \2.1\ if the sign of -^ is constant, then the following 
relation holds in sign algebra: 

s{AXi)^ ^ s{k,i)s{AXk) (2) 

kEpred{i) 

where s{AXk) denotes the sign ofX} —X^q^- 

By writing Eq. |2lfor all nodes in the graph, we obtain a system of equations on signs of variations, 
later referred to as qualitative system associated to the interaction graph G. This will be used extensively 
in the next sections. 



2.4 Link between qualitative and quantitative 

The qualitative system obtained from Eq|2lis a consequence of the quantitative relations that result from 
Theorem 12. II So the sign function maps a quantitative variation between two equilibrium points onto a 
qualitative solution of Eq|2l The converse is not true in general. For a given solution S of the qualitative 
system, there might be no equilibrium change AX in the differential quantitative model, s.t. each real- 
valued component of AX has the sign given by S. 

However, some components of the solution vectors are uniquely determined by the qualitative sys- 
tem. They take the same sign value in every solution vector. For such so-called hard components, the 
sign of any quantitative solution (if it exists) is completely determined by the qualitative system. 

We will use the previous properties to check the coherence between models and experimental data. 
By experimental data we mean the sign of the observed variation in concentration for some nodes. In 
particular, if the qualitative system associated to an interaction graph G has no solution given some ex- 
perimental observations, then no function F satisfying the sign conditions on the derivatives can describe 
the observed equilibrium shift, meaning that either the model is wrong, either some data are corrupted. 
In the next section, we introduce a simphfied model related to hpid metabolism, and illustrate the above 
described formalism. 

3 Toy example: regulation of the synthesis of fatty acids 

In order to illustrate our approach, we use a toy example describing a simplified model of genetic regu- 
lation of fatty acid synthesis in liver. The corresponding interaction graph is shown in Fig. [2 

Two ways of production of fatty acids coexist in liver. Saturated and mono-unsaturated fatty acids 
are produced from citrates thanks to a metabolic pathway composed of four enzymes, namely ACL (ATP 
citrate liase), ACC (acetyl-Coenzyme A carboxylase), FAS (fatty acid synthase) and SCDl (Stearoyl- 
CoA desaturase 1). Polyunsaturated fatty acids (PUFA) such as arachidonic acid and docosahexaenoic 
acid are synthesized from essential fatty acids provided by nutrition; DSD (Delta-5 Desaturase) and D6D 
(Delta-6 Desaturase) catalyze the key steps of the synthesis of PUFA. 

PUFA plays pivotal roles in many biological functions; among them, they regulate the expression 
of genes that impact on lipid, carbohydrate, and protein metabolism. The effects of PUFA are medi- 
ated either directly through their specific binding to various nuclear receptors (PPARa - peroxisome 
proliferator activated receptors, LXRa - Liver-X-Receptor a, HNF-4a) leading to changes in the trans- 
activating activity of these transcription factors; or indirectly as the result of changes in the abundance 
of regulatory transcription factors (SREBP-lc - sterol regulatory element binding-protein-, ChREBP, 
etc.) fT3\ . 

Variables in the model We consider in our model nuclear receptors PPARa, LXRa, SREBP-lc (de- 
noted by PPAR, LXR, SREBP respectively in the model), as they are synthesized from the correspond- 
ing genes and the trans-activating active forms of these transcription factors, that is, LXR-a (denoting 
a complex LXRa:RXRa), PPAR-a (denoting a complex PPARa:RXRa) and SREBP-a (denoting the 
cleaved form of SREBP-lc. We also consider SCAP - (SREBP cleavage activating protein), a key en- 
zyme involved in the cleavage of SREBP-lc, that interacts with another family of proteins called INSIG 
(showing the complexity of molecular mechanism). 

We also include in the model "final" products, that is, enzymes ACL, ACC, FAS, SCDl (implied in 
the fatty acid synthesis from citrate), DSD, D6D (implied in PUFA synthesis) as well as PUFA them- 
selves. 

Interactions in the model Relations between the variables are the following. SREBP-a is an activator 
of the transcription of ACL, ACC, FAS, SCDl, DSD and D6D L20iil3i. LXR-a is a direct activator of the 
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Figure 1: Interaction graph for the toy model. Self -regulation loops on nodes are omitted for sake of 
clarity. Observed variations are depicted next to each vertex, when available. 



transcription of SREBP and FAS, it also indirecdy activates ACL, ACC and SCDl |26 1. Notice that these 
indirect actions are kept in the model because we don't know whether they are only SREBP-mediated. 

PUFA activates the formation of PPAR-a from PPAR, and inhibits the formation of LXR-a from LXR 
as well as the formation of SREBP-a (by inducing the degradation of mRNA and inhibiting the cleavage) 
fT3\. SCAP represents the activators of the formation of SREBP-a from SREBP, and is inhibited by 
PUFA. 

PPAR directly activates the production of SCDl, DSD, D6D |l9l|27|[T8l. The dual regulation of 
SCDl, DSD and D6D by SREBP and PPAR is paradoxical because SREBP transactivates genes for 
fatty acid synthesis in liver, while PPAR induces enzymes for fatty acid oxidation. 
Hence, the induction of DSD and D6D gene by PPAR appears to be a compensatory response to the 
increased PUFA demand caused by induction of fatty acid oxidation. 



Fasting-refeeding protocols The fasting-refeeding protocols represent a favorable condition for study- 
ing lipogenesis regulation; we suppose that during an experimentation, animals (as rodents or chicken) 
were kept in a fasted state during several hours. Then, hepatic mRNA of LXR, SREBP, PPAR, ACL, 
FAS, ACC and SCDl are quantified by DNA microarray analysis. Biochemical measures also provide 
the variation of PUFA. 

A compilation of recent literature on lipogenesis regulation provides hypothetical results of such 
protocols: SREBP, ACL, ACC, FAS and SCDl dechne in hver during the fasted state |l7l. This is 
expected because fasting results in an inhibition of fatty acid synthesis and an activation of the fatty 



acid oxidation. For the same reason, PPAR is increased in order to trigger oxidation. However, Tobin 
et al (1 28 1) showed that fasting rats for 24h increased the hepatic LXR mRNA, although LXR positively 
regulates fatty acid synthesis in its activated form. Finally, PUFA levels can be considered to be increased 
in liver following starvation because of the important lipolysis from adipose tissue as shown by Lee et 
al in mice after 72h fasting (L15.I). 

Qualitative system derived from the graph As explained in the previous section, we derive a quali- 
tative system from the interaction graph shown in Fig. ^ For ease of presentation, we denote by A the 

sign of variation for species A. 
System 1 

(1) PPAR-a = PPAR + PUFA ^, 

(2) LXR-a = -PUFA + LXR Observations! 

(3) SREBP = LXR-a ^^^^ "^ 

PUFA = + 

(4) SREBP-a = SREBP + SCAP -PUFA 

LXR = + 

(5) ACL = LXR-a + SREBP-a - PUFA 

SREBP = — 

(6) ACC = LXR-a + SREBP-a - PUFA 

APT = — 

(7) FAS = LXR-a + SREBP-a - PUFA 

ACC = - 

(8) SCDl = LXR-a + SREBP-a - PUFA + PPAR-a 

FA9 = - 

(9) SCAP = -PUFA 

SCDl = - 

(10) D5D = PPAR-a + SREBP-a - PUFA 

(11) D6D = PPAR-a + SREBP-a - PUFA 

In the next section, we propose an efficient representation for such qualitative systems. 

4 Analysis of qualitative equations: a new approach 

4.1 Resolution of qualitative systems 

The resolution of (even linear) qualitative systems is a NP-complete problem (see for instance f29"8l). 
One can show this by reducing the satisfiability problem for a finite set of clauses to the resolution of a 
qualitative system in polynomial time. 

Let us consider a collection C = {ci , . . . ,c„} of clauses on a finite set V of variables. Let {+,-, ?} 
a sign qualitative algebra. In order to reduce the satisfiability problem to the resolution of a qualitative 
system, let us code true into + and false into — . If c is a clause, let us denote by c the encoding of c in 
a qualitative algebra formula. The following encoding scheme provides a polynomial procedure to code 
a clause into a qualitative formula. : 

clause sign algebra 



a G y — > a 

CiVC2 — > C1+C2 

-ic — > —c 



The satisfiability problem for the set of clauses C is then reduced to finding a solution of the quali- 
tative system: 

{ci!^ + / i= l,...,n} 

So a NP-complete problem can be reduced to the resolution of a qualitative system in polynomial time 
(with respect to the size of the problem). This shows that solving qualitative systems is a NP-complete 
problem. For example, the only pair of values which are not solution of —d + b ^ + are (+, — ). This 
corresponds to the only pair {true, false) that does not satisfy ^aV b. 

Several heuristics were proposed for the resolution of qualitative systems. For linear systems, set 
of rules have been designed |8|. This set is complete: it allows to find every solution. It is also sound: 
every solution found by applying these rules is correct. The rules are based on an adaptation of Gaussian 



elimination. However only heuristics exist for choosing the equation and the rule to apply on it. In case 
of a dead-end, when no more rule can apply, it is necessary to backtrack to the last decision made. As a 
result programs implementing qualitative resolution are not very efficient in general and only problems 
of small size can be resolved in reasonable time. For that reason we propose an alternate way to solve 
qualitative systems (linear or not). 

4.2 Qualitative equation coding 

Our method is based on a coding of qualitative equations as algebraic equations over Galois fields Z/pZ 
where p is a. prime number greater than 2. The elements of these fields are the classes modulo p of the 
integers. If x denotes the class of the integer x modulo p, a sum and a product are defined on Z/pZ as 
follows: 



sign algebra 


Z/3Z 


€[+€2 


> -e^.e^.{e^ + e^) 


^1 X ^2 


> ej.e^ 


^1 ~ ^2 — 


> Ff .^2 • (^ ~ ^) 



x + y =x + y xxy =xxy 
Galois fields have two basic properties which we use extensively: 

• Every function / : {'L/p'LY ~^ '^/pl' with n arguments Z/pZ is a polynomial function 

• if © denotes the operation f®g = f^^^'^' +g'^^'s then every equation system p\ {X) = 0, . . . ,Pk{X) 
has the same solutions than the unique equation pi® P2® ■■■® Pk{X) = 0. 

The following table specifies how the sign algebra {+,-,?} is mapped onto the Galois field with 
three elements Z/3Z is used for that coding. 

sign algebra Z/3Z 

+ ~ I 

? ^0 

Finally a qualitative system {ei, . . . ,e„} is coded as the polynomial ^® • • ■ ©e^. A similar coding 
for the qualitative algebra {+,-,0,?} uses the Galois field Z/5Z and will not be presented here. 

With this coding, every qualitative system has a solution if and only if the corresponding polynomial 
has a solution without null component. Null solutions are excluded since ? solutions are excluded for 
qualitative systems. In general we will have to add polynomial equations X^ = 1 to insure this. 

4.3 An efficient representation of polynomial functions 

Recall that our purpose is to efficiently solve a NP-complete problem. There is no hope to find a rep- 
resentation of polynomial functions allowing to solve polynomial systems of equations in polynomial 
time. The coding of a qualitative system as a polynomial equation is obviously polynomial in the size 
of the system (number of variables plus number of equations). So finding the solution of a polynomial 
system of equations is itself a NP-complete problem. It is more or less the SAT problem. 

Nevertheless, there exists a representation of polynomial functions on Galois fields which gives, in 
practice, good performances for polynomials with hundreds of variables. This kind of representation was 
first used for logical functions which may be considered as polynomial functions over the field Z/2Z. 
This representation is known as BDD (Binary Decision Diagrams) and is widely used in checking logical 
circuits ||2| and in model checkers as nu-SMV |5 1. 

We present here this representation for the field Z/3Z. Generalizations to other Galois fields could 
be treated as well. The starting point is a generalization of Shannon decomposition for logical functions: 

p{XuX) = (1 -Xf)p^x,^o]iX)+Xi{-X, -Xf)p^x,^,]{X)+Xi{Xi -Xf)p^x,^2]iX) 



where p is a. polynomial function with n variables. This decomposition leads to a tree representation of 
the polynomial function: the variable Xi is the root and has three children. Each of these is obtained by 
instantiating Xi to -1, or 1 in p{Xi,X). This representation is exponential (3") as each non constant 
node has 3 children. It also depends on a chosen order on the variables. 

Then a key observation (see |2|), is that several subtrees are identical. They have the same variable 
as root variable and isomorphic children. If we decide to represent only once each type of tree, then the 
tree representation is transformed into a direct acyclic graph. With this representation there is no more 
redundancy among subtrees. The result may be a dramatic decrease in the size of the representation of a 
polynomial function. 





Figure 2: From tree representation to direct acyclic graph for X^(F + 1). The tree has 13 nodes while 
the DAG representing the same function has 5 nodes. 



A property of the Shannon like decomposition is that many operations on polynomial functions are 
recursive with respect to this decomposition. More precisely let 

p'{X,,X) = (1 -Xf)p'o{X) +Xi(-Xi -Xf)p[{X) +X,{X, -Xl)p'2{X) 



i = 1,2 be two polynomial functions with Pa{X) 
A on polynomial functions, 



P[Xi=a]i^) ! « = 0, 1,2. Then for binary operations 



piA/ = (1 -XfXp'oApl) +Xi(-Xi -Xf){p\Apj)+X,{Xi -Xf){p\Apl) 

This kind of recursive formula leads to an exponential complexity of any computation. Again, it is pos- 
sible to take advantage of the redundancy by using a cache to remember each operation. This technique 
is known as memoisation in formal calculus. A 40% cache hit rate is commonly observed. 

More complex operations on polynomial functions are also implemented with a recursive scheme 
and memoisation. Let us just mention quantifier elimination as among the most useful for our purpose. 

This representation of polynomial functions on Galois fields has also several drawbacks: 

• the memory size heavily depends on the order of variables. The libraries implementing formal 
computations always have reordering algorithms. 

• for each order, there exists polynomial functions which are exponential in memory size. 

Nevertheless, in practice, this representation has proved to be very efficient for polynomial functions 
with several hundred of variables. The computations performed on our toy model and on another real size 
one used a program named SIGALI which is devoted to polynomial functions on Z/3Z representation. 
Several algorithms were added to this program in order to answer questions of biological interest. 



5 Qualitative models and experimental data 

In this section, we show how to compute some properties of a quaUtative system, and eventually get some 
insights on the biological model it represents. The algorithms we derive heavily rely on the represen- 
tation introduced above. Hence, not only they can deal in practice with computationally hard problems 
efficiently, but also they are expressed in a rather simple and generic fashion. 

Let M be a qualitative model represented by its associated interaction graph G{V,E). Recall that V 
is the set of variables. Let Vq be the set of observed variables, and o,- € {+, -} for / € Vq the experimental 
observations. As explained in the previous section, the qualitative system derived from M can be coded 
as a polynomial function Pm{Xi ,... ,X„). Roots of Pm correspond to solutions of the qualitative system. 

5.1 Satisfiability of the qualitative system 

A property of the coding described above, is that the system has no solution iff Pm is equal to the constant 
polynomial 1. Alternatively if Pm = 0, the qualitative equations do not constraint the variables at all. 

Now if some observations o,- for / G Vq are available, checking their consistency with the model M 
boils down to instantiating X,- = o,- in Pm{X\,...,X„), for all i ^Vq, and testing whether the resulting 
polynomial is different from 1. 

We computed the polynomial Pl associated to our toy example (see section |3ll and it has roots. 
Recall that it does not guarantee the existence of some (quantitative) differential model conforming to 
the interaction graph depicted in Fig. [2 Satisfiability of the qualitative system is only a necessary 
condition for the model to be correct. 

The polynomial obtained by instantiating observations into Pl is different from 1, meaning that our 
model does not contradict generally observed variations during fasting. 

Large size models might advantageously be reduced using standard graph techniques. First we look 
for connected components in the interaction graph. A graph with several connected components repre- 
sents a coherent qualitative model iff each component is coherent. Second, a node without successor 
except itself appears only in its associated equation. If this node is not observed, its associated qualitative 
equation adds no constraint on the other nodes. So, at least for satisfiability checking, this node can be 
suppressed and its qualitative equation removed from the system. This procedure is applied iteratively, 
until no node can be deleted. The resulting graph leads to a new qualitative system which is satisfiable 
iff the initial system is satisfiable. 

5.2 Correcting data or model 

If the qualitative system, given some experimental observations, is found to have no solution, it is of 
interest to propose some correction of the data and/or the model. By correction, we mean inverting the 
sign of an observed variable or the sign of an edge of the interaction graph. In the general case, there 
are several possibilities to make the system satisfiable, and we need some criterion to choose among 
them. We applied a parsimony principle: a correction of the data should imply a minimal number of 
sign inversions. 

In the following, we show how to compute all minimal corrections for the data. Given (o,),eVQ a 
vector of experimental observations which is not compatible with the model, we compute all (o^igVo 
vectors which are compatible with the data and such that the Hamming distance between o and o' is 
minimal. By Hamming distance, we mean the number of differences between o and o'. The set of such 
o' vectors might be very large; but again, by encoding it as the set of roots of a polynomial function, we 
obtain a compact representation. 

This procedure can be extended in a straightforward manner to corrections of edges sign in the inter- 
action graph. This is done by considering these signs as variables of the model. For ease of presentation, 
we only detail data correction. 
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Input: 

P, a polynomial function on variables V 
i(zV 
Output: 

C, a polynomial function encoding all minimal corrections 
d, minimal number of corrections 

if P is constant then 
ifP = Othen 

Result: C = 0, <i = 
else 

Result: C= l,<i = °o 

end 

else 

let Pq, Pi, P2 be the Shannon decomposition of P with respect to variable Xj, 

and {Cj,dj) the result obtained by recursively applying the algorithm on P/ and /+ 1 for y G {0, 1,2} 

Md' = i ''j^^ if/eVoando,-^; ^nj C = ^ ^^' "-^^ ®'^^' '^'^^o 



•' 1 dj otherwise J 1 Cj otherwise 

Result: d = min d'j, C= Yl ^J 

J. d'j=d 

end 

Algorithm 1: Algorithm for experimental data correction. 



Let us illustrate this algorithm on our toy example: during fasting experiments, synthesis of fatty 
acids tends to be inhibited, while oxidation, which produces ATP, is activated. In particular ACC, ACL, 
FAS and SCDl are implied in the same pathway to produce saturated and monounsaturated fatty acids. 
Expectedly, they are known to decline together at fasting. Suppose we introduce some wrong obser- 
vation, say for instance an increase of ACL, while keeping all other observations given above. The 
polynomial obtained from P^ including these new observations is equal to 1, and hence has no solution. 
Applying algorithm ^ we recover this error. Now if we wrongly change two values, say ACL and ACC 
to 1, the algorithm proposes a different coiTcction, namely to change the observed value of SREBP to 1, 
which is more parsimonious. 

5.3 Experiment design 

It is often the case that not all variables in the system under study can be observed. Biochemical mea- 
surements of metabolites can be costly and/or time consuming. By experiment design, we mean here 
the choice of the variables to observe so that an experiment might be informative. 

Let Pm{Xo,Xij) be the polynomial function coding for the qualitative system M. Xq (resp. Xy) 
denotes the state vector of observed (resp. unobserved) variables. The polynomial function repre- 
senting the admissible values of the observed variables is obtained by elimination of the quantifier in 
3Xu Pm{Xo,Xu). Let P^{Xo) denote the resulting polynomial function. 

For some choice of observed variables, it might well be that P^ is null, which basically means that 
the experiment is totally useless. Remark that no improvement can be found by taking a subset of Xq 
The solution is either to add new observed variables or to chose a completely different set of observed 
variables. 

In order to assess the relevance of a given experiment (namely of a given observed subset), we 
suggest to compute the following ratio: number of consistent valuations for observed variables versus 
the total number of valuations of observed variables. A very stringent experiment has a low ratio. An 
experiment having a ratio value of one is useless. 

Again this computation is carried out in a recursive fashion. Let P be a polynomial function rep- 



11 



resenting the set of admissible observed values. Let Rat{p) the percentage of solutions of P{X) = 
in the space (Z//7Z)", where n is the number of variables X. If P is constant then Rat{P) = 1 (resp. 
Rat{P) = 0) if P = (resp. P / 0). Else, let Pi, P2, P3 be a Shannon like decomposition of P{X) with 
respect to some variable of P. Then it is easy to prove: 

Rat{P) = {Rat{Po)+Rat{Pi)+Rat{P2))/3 

The relevance of this approach was assessed on our toy example: for each subset O of variables in 
the model, containing at most four variables, we computed Rat{Pj^). Expectedly, the lowest ratios (i.e. 
the most stringent experiments) were achieved observing four variables: either {SCAP, PUFA, PPAR-a, 
PPAR}, or {SREBP, SCAR PUFA, LXR-a}, or {SREBR PPAR-a, PPAR, LXR-a}. 

Interestingly, the procedure captures what might be though of as control variables, like PUFA/SCAP, 
SREBP/LXR-a and PPAR/PPAR-a. The first two pairs control the activation of fatty acids synthesis; the 
third one controls fatty acid oxidation. 

Indeed one can go even further: if we isolate some kind of control variables, we are naturally inter- 
ested in knowing how they constrain other variables. Achieving this amounts to computing the set of 
variables which value is constant for all solutions of the system (the so called hard components). Recall 
that these hard components of qualitative solutions are also important with respect to the hypothetical 
differential model which is abstracted in the qualitative one. Indeed, all solutions of the quantitative 
equation for equilibrium change have the same sign pattern on the hard components. Algorithm |2l de- 
scribes a recursive procedure which finds the set of hard components, together with their value. 

Input: P, a polynomial function on variables V 
Output: 

the set W C V x {0, 1 , 2} of hard components, together with their values 
a boolean b which is true if P has at least one root 

if P is constant then 
ifP = Othen 

return {(d,true) 
else 

return {(djalse) 
end 

else 

let Pq, Pi, P2 be the Shannon decomposition of P with respect to variable Xj, 

and {Wj,bj) the result obtained by recursively applying the algorithm on Py for j G {0,1,2} 

let W = {(v, v')|v G y, y' G {0, 1,2}, V; bj =^ (v, v') G Wj} 
it there exists a unique jo s.t. bj^ is true then 

add(/,;o)toW 
end 

return (W,Z7oVfeiV/72) 
end 

Algorithm 2: Determination of hard components 

Let us set some of our previously found control variables of the toy example, to a given value, 
say PUFA to 1, and LXR to -1. Then applying the algorithm |2j the corresponding polynomial has the 
following hard components: 



ACL = -1 


FAS = -1 


ACC = -1 


LXR-a = -1 


SCAP = -1 


SREBP = -1 


SRF.RP-a = -1 


PPAR = -1 


PPAR-a = -1 





which expectedly corresponds to the inhibition of fatty acids synthesis. 
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5.4 Real size system 

We have used our new technique to check the consistency of a database of molecular interactions in- 
volved in the genetic regulation of fatty acid synthesis. In the database, interactions were classified as 
behavioral or biochemical. 

• a behavioral interaction describes the effects of a variation of a product concentration. It is either 
direct or indirect (unknown mechanism). 

• a biochemical interaction may be a gene transcription, a reaction catalyzed by an enzyme ... Such 
molecular interactions can be found in existing databases. They need a behavioral interpretation. 

All the behavioral interactions were manually extracted from a selection of scientific papers. Biochem- 
ical interactions were extracted from public databases available on the Web (Bind d, IntAct IT2II . 
Amaze 1 16|, KEGG |21 1 or TransPath |24||). A biochemical interaction may be linked to a behavioral 
interpretation in the database. 

The database is used to generate the interaction graph. While behavioral interactions directly corre- 
spond to edges in the graph, biochemical interactions are given a simplified interpretation. Roughly, any 
increase of a reaction input induces an increase of the outputs. 

The interaction graph which is built from the database contains more than 600 vertices and more than 
1400 edges. It is clear that even though, the obtained graph is not a comprehensive model of genetic 
regulation of fatty acid synthesis in liver. Anyway our aim is to see how far this model can account for 
experimental observations, and propose some corrections when it cannot. 

We used our technique to check the coherence of the whole model. After reducing the graph with 
standard graph techniques as described in section ISlTI we found that the model was incoherent. The re- 
duced graph has about 150 nodes. We developed a heuristic to isolate minimal incoherent sub-systems. 
It turned out that all the contradictions we detected resulted from arguable interpretations of the litera- 
ture. 

6 Conclusion 

In this paper we proposed a qualitative approach for the analysis of large biological systems. We rely on 
a framework more thoroughly described in [25 1, which is meant to model the comparison between two 
experimental conditions as a steady state shift. This approach fits well with state of the art biological 
measurement techniques, which provide rather noisy data for a large amount of targets. It is also well 
suited to the use of biological knowledge, which is most of the time descriptive and qualitative. 

This qualitative approach is all the more attractive that we can rely on new analysis methods for 
qualitative systems. This new technique is also introduced in this paper and is original in qualitative 
modeling. It relies on a representation of qualitative constraints by decision diagrams. Not only this has 
a major impact on the scalability of qualitative reasoning, but it also permits to derive many algorithms 
in a quite generic fashion. 

We plan to validate our approach on pathways which are published for yeast and E. Coli. Not only 
this pathways are of significant size but microarray data for this species are publicly available. Concern- 
ing the scalability of the methods, qualitative systems with up to 200 variables are handled within a few 
minutes. 

On the theoretical side, we study applications of our algebraic techniques to network reconstruction, 
as proposed in l30l . The problem is to infer direct actions between products, based on large scale 
perturbation data, in order to obtain the most parsimonious interaction graph. Our approach could lead to 
a reformulation of this problem in terms of polynomial operations. Indeed, finding a minimal regulation 
network from a minimal polynomial representation has already been described in [14J , though it was 
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applied to a rather different type of network. A similar approach tailored to the framework described in 
this paper could eventually lead to original and practical algorithms for network reconstruction. 
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